Provable Super-Convergence With a Large Cyclical Learning Rate

Authors

Abstract

Conventional wisdom dictates that the learning rate should be in the stable regime so that gradient-based algorithms don't blow up. This letter introduces a simple scenario where an unstably large learning rate scheme leads to super-fast convergence, with the convergence rate depending only logarithmically on the condition number of the problem. Our scheme uses a Cyclical Learning Rate (CLR) where we periodically take one large unstable step and several small stable steps to compensate for the instability. These findings also help explain the empirical observations of [Smith & Topin, 2019], who show that CLR with a large maximum learning rate can dramatically accelerate learning and lead to so-called “super-convergence”. We prove that our scheme excels on problems where the Hessian exhibits a bimodal spectrum, with eigenvalues grouped into two clusters (small and large). The unstably large step is the key to enabling fast convergence over the small eigen-spectrum.
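
To make the scheme concrete, the following sketch runs it on a toy quadratic whose Hessian has the bimodal spectrum described above. This is an illustration of the idea only: the eigenvalue clusters, step sizes, and cycle length are assumptions chosen for the demo, not values from the letter.

    import numpy as np

    # Toy quadratic f(x) = 0.5 * x^T H x whose Hessian has a bimodal spectrum:
    # one cluster of small eigenvalues and one cluster of large ones.
    # All numerical choices here are illustrative, not taken from the letter.
    eigs = np.concatenate([np.linspace(0.01, 0.02, 5),   # small cluster
                           np.linspace(0.80, 1.00, 5)])  # large cluster
    H = np.diag(eigs)

    def run(lr_schedule, steps=100):
        # Plain gradient descent with a per-iteration learning rate.
        x = np.ones(len(eigs))
        for t in range(steps):
            x = x - lr_schedule(t) * (H @ x)
        return np.linalg.norm(x)  # distance to the optimum x* = 0

    eta_stable = 1.0 / eigs.max()      # classical stable step, about 1/lambda_max
    eta_big = 1.0 / eigs[:5].mean()    # unstable step, about 1/lambda_small
    period = 10                        # one unstable step per cycle

    constant = lambda t: eta_stable
    cyclical = lambda t: eta_big if t % period == 0 else eta_stable

    print("constant stable LR:", run(constant))  # error decays slowly
    print("cyclical LR:", run(cyclical))         # error is orders of magnitude smaller

In this toy run, the occasional step of size roughly 1/lambda_small contracts the error along the small-eigenvalue cluster in a single shot, while the intervening stable steps of size roughly 1/lambda_max damp the blow-up that the big step causes along the large cluster; this division of labor is what lets the iteration count depend only logarithmically on the condition number.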

Similar Articles

A Randomized Asynchronous Linear Solver with Provable Convergence Rate

Asynchronous methods for solving systems of linear equations have been researched since Chazan and Miranker published their pioneering paper on chaotic relaxation in 1969. The underlying idea of asynchronous methods is to avoid processor idle time by allowing the processors to continue to work and make progress even if not all progress made by other processors has been communicated to them. His...

Convergence of Gradient Dynamics with a Variable Learning Rate

As multiagent environments become more prevalent we need to understand how this changes the agent-based paradigm. One aspect that is heavily affected by the presence of multiple agents is learning. Traditional learning algorithms have core assumptions, such as Markovian transitions, which are violated in these environments. Yet, understanding the behavior of learning algorithms in these domains...

Super-convergence: Very Fast Training of Residual Networks Using Large Learning Rates

In this paper, we show a phenomenon, which we named “super-convergence”, where residual networks can be trained using an order of magnitude fewer iterations than is used with standard training methods. One of the key elements of super-convergence is training with cyclical learning rates and a large maximum learning rate. Furthermore, we present evidence that training with large learning rates im...

(A minimal sketch of the triangular cyclical learning rate schedule appears after this list.)

A Collocation Method for Integral Equations with Super-Algebraic Convergence Rate

We consider biperiodic integral equations of the second kind with weakly singular kernels such as they arise in boundary integral equation methods. The equations are solved numerically using a collocation scheme based on trigonometric polynomials. The weak singularity is removed by a local change to polar coordinates. The resulting operators have smooth kernels and are discretized using the ten...

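Since the entries above repeatedly invoke the cyclical learning rate (CLR) policy, a minimal sketch of the triangular schedule popularized by Smith may be useful; the parameter names and default values are illustrative assumptions, not taken from the cited papers.

    def triangular_clr(step, base_lr=1e-3, max_lr=1.0, stepsize=500):
        # Triangular cyclical learning rate: ramp linearly from base_lr to
        # max_lr over `stepsize` iterations, then back down, and repeat.
        # The defaults are illustrative, not values from the cited papers.
        cycle = step // (2 * stepsize)
        pos = abs(step / stepsize - 2 * cycle - 1)  # position in cycle, in [0, 1]
        return base_lr + (max_lr - base_lr) * (1.0 - pos)

    # Learning rate at a few points of the first cycle: low -> high -> low.
    for s in (0, 250, 500, 750, 1000):
        print(s, round(triangular_clr(s), 4))
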
Journal

Journal title: IEEE Signal Processing Letters

Year: 2021

ISSN: 1558-2361, 1070-9908

DOI: https://doi.org/10.1109/lsp.2021.3101131